251 research outputs found
Bayesian meta-analysis for identifying periodically expressed genes in fission yeast cell cycle
The effort to identify genes with periodic expression during the cell cycle
from genome-wide microarray time series data has been ongoing for a decade.
However, the lack of rigorous modeling of periodic expression, as well as the
lack of a comprehensive model for integrating information across genes and
experiments, has impaired the accurate identification of
periodically expressed genes. To address the problem, we introduce a Bayesian
model to integrate multiple independent microarray data sets from three recent
genome-wide cell cycle studies on fission yeast. A hierarchical model was used
for data integration. To facilitate efficient Monte Carlo sampling
from the joint posterior distribution, we develop a novel Metropolis--Hastings
group move. A surprising finding from our integrated analysis is that more than
40% of the genes in fission yeast are significantly periodically expressed,
far exceeding the 10--15% of genes reported in the current literature.
This finding calls for a reconsideration of the periodically expressed gene
detection problem.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS300 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma" of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Bayesian detection of embryonic gene expression onset in C. elegans
To study how a zygote develops into an embryo with different tissues,
large-scale 4D confocal movies of C. elegans embryos have been produced
recently by experimental biologists. However, the lack of principled
statistical methods for the highly noisy data has hindered the comprehensive
analysis of these data sets. We introduce a probabilistic change point model
on the cell lineage tree to estimate the embryonic gene expression onset time.
A Bayesian approach is used to fit the model to the 4D confocal movie data.
Subsequent classification methods are used to decide a model selection
threshold and further refine the expression onset time from the branch level to
the specific cell time level. Extensive simulations have shown the high
accuracy of our method. Its application to real data yields both previously
known results and new findings.
Comment: Published at http://dx.doi.org/10.1214/15-AOAS820 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies
Gene-gene interactions have long been recognized to be fundamentally
important to understand genetic causes of complex disease traits. At present,
identifying gene-gene interactions from genome-wide case-control studies is
computationally and methodologically challenging. In this paper, we introduce a
simple but powerful method, named "BOolean Operation based Screening and
Testing" (BOOST). To discover unknown gene-gene interactions that underlie
complex diseases, BOOST allows examining all pairwise interactions in
genome-wide case-control studies in a remarkably fast manner. We have carried
out interaction analyses on seven data sets from the Wellcome Trust Case
Control Consortium (WTCCC). Each analysis took less than 60 hours on a standard
3.0 GHz desktop with 4 GB of memory running Windows XP. The interaction
patterns identified from the type 1 diabetes data set differ significantly
from those identified from the rheumatoid arthritis data set, even though
both data sets share a very similar hit region in the WTCCC report. BOOST has
also identified many previously undiscovered interactions between genes in the major
histocompatibility complex (MHC) region in the type 1 diabetes data set. In the
coming era of large-scale interaction mapping in genome-wide case-control
studies, our method can serve as a computationally and statistically useful
tool.
Comment: Submitte
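The BOOST abstract above attributes its speed to Boolean operations but does not spell out the mechanics. The following is a minimal sketch of one way such operations could be used: each genotype level is stored as a bit mask over samples, and the cell counts of a pairwise genotype-by-status contingency table are obtained with bitwise AND followed by a bit count. The function names, the 0/1/2 genotype coding, and the toy data are illustrative assumptions, not taken from the paper, and no statistical test is performed here.

```python
def pack_bits(flags):
    """Pack a list of booleans into one Python int, one bit per sample."""
    word = 0
    for position, flag in enumerate(flags):
        if flag:
            word |= 1 << position
    return word


def popcount(word):
    """Number of set bits in a non-negative integer."""
    return bin(word).count("1")


def contingency_table(snp_a, snp_b, is_case):
    """Counts indexed as table[genotype_a][genotype_b][0=control, 1=case].

    Genotypes are assumed to be coded 0, 1, 2 (a hypothetical encoding).
    """
    masks_a = [pack_bits([g == level for g in snp_a]) for level in range(3)]
    masks_b = [pack_bits([g == level for g in snp_b]) for level in range(3)]
    case_mask = pack_bits(is_case)
    control_mask = pack_bits([not c for c in is_case])

    table = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for i in range(3):
        for j in range(3):
            # Samples carrying genotype i at SNP A and genotype j at SNP B.
            joint = masks_a[i] & masks_b[j]
            table[i][j][0] = popcount(joint & control_mask)
            table[i][j][1] = popcount(joint & case_mask)
    return table


# Toy data for eight samples (made up for illustration only).
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 0, 1, 1, 2, 2, 1, 0]
is_case = [True, False, True, True, False, False, True, False]

table = contingency_table(snp_a, snp_b, is_case)
```

Because the masks for each SNP are built once and reused across all pairs, the per-pair cost in this sketch reduces to a handful of AND and bit-count operations, which is the kind of saving that makes an exhaustive pairwise scan plausible.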
Flatness-aware Adversarial Attack
The transferability of adversarial examples can be exploited to launch
black-box attacks. However, adversarial examples often present poor
transferability. To alleviate this issue, input regularization based methods
have been proposed, building on the observation that input diversity can boost
transferability; they craft adversarial examples by combining several transformed
inputs. We reveal that input regularization based methods make resultant
adversarial examples biased towards flat extreme regions. Inspired by this, we
propose an attack called flatness-aware adversarial attack (FAA) which
explicitly adds a flatness-aware regularization term in the optimization target
to push the resultant adversarial examples towards flat extreme regions. The
flatness-aware regularization term involves gradients of samples around the
resultant adversarial examples, but optimizing gradients requires evaluating
the Hessian matrix in high-dimensional spaces, which is generally intractable. To
address the problem, we derive an approximate solution that circumvents the
construction of the Hessian matrix, thereby making FAA practical and cheap.
Extensive experiments show the transferability of adversarial examples crafted
by FAA can be considerably boosted compared with state-of-the-art baselines.
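The FAA abstract above does not state the form of its Hessian-free approximation. As a generic illustration of how explicit Hessian construction can be avoided, the sketch below uses the standard finite-difference identity H(x) v ≈ (∇f(x + εv) − ∇f(x)) / ε, which needs only two gradient evaluations. This is a textbook technique swapped in for illustration, not the authors' derivation; the quadratic test function and all names are assumptions.

```python
def grad_quadratic(x, A):
    """Gradient of f(x) = 0.5 * x^T A x for a symmetric matrix A, i.e. A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def hvp_finite_difference(grad_fn, x, v, eps=1e-5):
    """Approximate the Hessian-vector product H(x) v without forming H.

    Uses (grad(x + eps * v) - grad(x)) / eps, so the cost is two
    gradient evaluations regardless of the dimension of x.
    """
    x_shifted = [x_i + eps * v_i for x_i, v_i in zip(x, v)]
    g_base = grad_fn(x)
    g_shifted = grad_fn(x_shifted)
    return [(g1 - g0) / eps for g0, g1 in zip(g_base, g_shifted)]


# Toy check on a quadratic, where the Hessian is exactly A.
A = [[2.0, 1.0], [1.0, 3.0]]
x = [0.5, -1.0]
v = [1.0, 2.0]

approx = hvp_finite_difference(lambda p: grad_quadratic(p, A), x, v)
exact = grad_quadratic(v, A)  # H v = A v for this quadratic
```

For a model with d input dimensions, the full Hessian has d² entries, whereas this approximation only ever handles vectors of length d.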
Monotone Cubic B-Splines
We present a method for fitting monotone curves using cubic B-splines with a
monotonicity constraint on the coefficients. We explore different ways of
enforcing this constraint and analyze their theoretical and empirical
properties. We propose two algorithms for solving the spline fitting problem:
one that uses standard optimization techniques and one that trains a
Multi-Layer Perceptron (MLP) generator to approximate the solutions under
various settings and perturbations. The generator approach can speed up the
fitting process when we need to solve the problem repeatedly, such as when
constructing confidence bands using bootstrap. We evaluate our method against
several existing methods, some of which do not use the monotonicity constraint,
on some monotone curves with varying noise levels. We demonstrate that our
method outperforms the other methods, especially in high-noise scenarios. We
also apply our method to analyze the polarization-hole phenomenon during star
formation in astrophysics. The source code is accessible at
https://github.com/szcf-weiya/MonotoneSplines.jl
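As a small illustration of why a constraint on the coefficients can enforce monotonicity, the sketch below evaluates a cubic B-spline whose coefficients are nondecreasing and checks numerically that the resulting curve is nondecreasing. It assumes the constraint takes the form c_1 <= c_2 <= ... <= c_J, which is a well-known sufficient condition; the knot vector, coefficients, and function names are made up for illustration, and the code is unrelated to the MonotoneSplines.jl implementation. No fitting is performed.

```python
def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion: i-th B-spline basis of degree k on knots t at x."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] > t[i]:  # skip terms with zero-width support
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] > t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def spline_value(coefs, t, x, degree=3):
    """Value of the spline sum_i coefs[i] * B_i(x)."""
    return sum(c * bspline_basis(i, degree, t, x) for i, c in enumerate(coefs))


# Clamped cubic knot vector on [0, 1] with three interior knots (7 basis functions).
knots = [0.0] * 4 + [0.25, 0.5, 0.75] + [1.0] * 4
# Hypothetical nondecreasing coefficients.
coefs = [0.0, 0.1, 0.1, 0.5, 0.9, 1.0, 1.5]

# Evaluate on [0, 1); the right endpoint is excluded because the degree-0
# basis functions use half-open intervals.
grid = [j / 200 for j in range(200)]
values = [spline_value(coefs, knots, x) for x in grid]
is_monotone = all(b >= a - 1e-12 for a, b in zip(values, values[1:]))
```

The condition is sufficient rather than necessary: the derivative of the spline is itself a B-spline whose coefficients are positive multiples of the differences c_i − c_{i−1}, so nonnegative differences give a nonnegative derivative.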